Hardware-oriented Cache Management for Large-scale Chip Multiprocessors
نویسنده
چکیده
One of the key requirements to obtaining high performance from chip multiprocessors (CMPs) is to effectively manage the limited on-chip cache resources shared among co-scheduled threads/processes. This thesis proposes new hardware-oriented solutions for distributed CMP caches. Computer architects are faced with growing challenges when designing cache systems for CMPs. These challenges result from non-uniform access latencies, interference misses, the bandwidth wall problem, and diverse workload characteristics. Our exploration of the CMP cache management problem suggests a CMP caching framework (CC-FR) that defines three main approaches to solve the problem: (1) data placement, (2) data retention, and (3) data relocation. We effectively implement CC-FR’s components by proposing and evaluating multiple cache management mechanisms. Pressure and Distance Aware Placement (PDA) decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Flexible Set Balancing (FSB), on the other hand, reduces interference misses via extending the life time of cache lines through retaining some fraction of the working set at underutilized local sets to satisfy far-flung reuses. PDA implements CC-FR’s data placement and relocation components and FSB applies CC-FR’s retention approach. To alleviate non-uniform access latencies and adapt to phase changes in programs, Adaptive Controlled Migration (ACM) dynamically and periodically promotes cache blocks towards L2 banks close to requesting cores. ACM lies under CC-FR’s data relocation category.
منابع مشابه
The MINC (Multistage Interconnection Network with Cache Control Mechanism) Chip
Although bus connected multiprocessors have been widely used as high-end workstations or servers, the number of connected processors is strictly limited by the maximum bandwidth of the shared bus. Instead of them, a switch connected multiprocessor which uses a crossbar or Multistage Interconnection Networks(MINs) for connecting processors and memory modules is a hopeful candidate. However, in s...
متن کاملCache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors
This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache, is contin...
متن کاملC-AMTE: A location mechanism for flexible cache management in chip multiprocessors
This paper describes Constrained Associative-Mapping-of-Tracking-Entries (C-AMTE), a scalable mechanism to facilitate flexible and efficient distributed cache management in large-scale chip multiprocessors (CMPs). C-AMTE enables fast locating of cache blocks in CMP cache schemes that employ one-to-one or one-to-many associative mappings. C-AMTE stores in percore data structures tracking entries...
متن کاملCompiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Evaluation
In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme which relies mostly on compiler analysis, but which also needs a reasonable amount of hardware support. Such a scheme can be implemented on a large-scale multiprocessor using oo-the-shelf microprocessors such as the Cray T3D, and can be adapted to various cache organizations including multi-word cache ...
متن کاملHardware and Compiler-Directed Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study
ÐIn this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multiword cache lines and byte-addressable architectures. Several system related issues, including critical sections,...
متن کامل